
    Intrinsic bias in breast cancer gene expression data sets

    Abstract
    Background: While global breast cancer gene expression data sets have considerable commonality in terms of their data content, the populations they represent and the data collection methods used can be quite disparate. We sought to assess the extent and consequences of these systematic differences for identifying clinically significant prognostic groups.
    Methods: We ascertained how effectively unsupervised clustering with randomly generated gene sets could segregate tumors into prognostic groups, using four well-characterized breast cancer data sets.
    Results: Using a common set of 5,000 randomly generated lists (70 genes/list), the percentages of clusters with significant differences in metastasis latencies (HR p-value < 0.01) were 62%, 15%, 21% and 0% in the NKI2 (Netherlands Cancer Institute), Wang, TRANSBIG and KJX64/KJ125 data sets, respectively. Among ER-positive tumors, the percentages were 38%, 11%, 4% and 0%, respectively. Few random lists were predictive among ER-negative tumors in any data set. Clustering was associated with ER status; after globally adjusting for the effects of ER-α gene expression, the percentages were 25%, 33%, 1% and 0%, respectively. The impact of adjusting for ER status depended on the extent of confounding between ER-α gene expression and markers of proliferation.
    Conclusion: A statistically significant association between a given gene list and prognosis is highly likely to be found in the NKI2 data set because of its large sample size and the interrelationship between ER-α expression and markers of proliferation. In most respects, the TRANSBIG data set produced outcomes similar to those of the NKI2 data set, although its smaller sample size led to fewer statistically significant results.
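The random-gene-list experiment in the Methods and Results can be sketched as a minimal illustration under stated assumptions, not as the authors' pipeline. The abstract does not name the clustering method, so two-cluster k-means (Lloyd's algorithm) is assumed here, and a two-group log-rank test stands in for the Cox hazard-ratio p-value the study reports. The names `fraction_significant`, `two_means` and `logrank_p` are hypothetical.

```python
import math
import random


def _dist2(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(rows, iters=100):
    """Split samples into two clusters with Lloyd's k-means (k = 2).

    Deterministic start: the first sample and the sample farthest from it.
    """
    centres = [rows[0], max(rows, key=lambda r: _dist2(r, rows[0]))]
    labels = None
    for _ in range(iters):
        new = [min((0, 1), key=lambda k: _dist2(r, centres[k])) for r in rows]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old centre if a cluster empties
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    o_minus_e = 0.0  # observed minus expected events in group 0
    var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n0 = at_risk.count(0)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d0 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 0)
        o_minus_e += d0 - d * n0 / n
        if n > 1:
            var += d * (n0 / n) * (1 - n0 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    stat = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df: P(|Z| > sqrt(stat))
    return math.erfc(math.sqrt(stat / 2))


def fraction_significant(expr, times, events, n_lists=5000,
                         list_size=70, alpha=0.01, seed=0):
    """Fraction of random gene lists whose 2-cluster split separates survival.

    expr is a samples x genes matrix (list of lists).
    """
    rng = random.Random(seed)
    n_genes = len(expr[0])
    hits = 0
    for _ in range(n_lists):
        genes = rng.sample(range(n_genes), list_size)
        rows = [[sample[g] for g in genes] for sample in expr]
        labels = two_means(rows)
        if len(set(labels)) == 2 and logrank_p(times, events, labels) < alpha:
            hits += 1
    return hits / n_lists
```

The defaults mirror the 5,000 lists of 70 genes and the p < 0.01 threshold quoted above; a data set in which almost any random list yields a significant split is the situation the Conclusion warns about.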

    Time to Recurrence and Survival in Serous Ovarian Tumors Predicted from Integrated Genomic Profiles

    Serous ovarian cancer (SeOvCa) is an aggressive disease with variable and often inadequate therapeutic outcomes after standard treatment. The Cancer Genome Atlas (TCGA) has provided rich molecular and genetic profiles from hundreds of primary surgical samples. These profiles confirm TP53 mutations in ∼100% of patients and reveal an extraordinarily complex profile of DNA copy number changes with considerable patient-to-patient diversity. This raises the joint challenge of exploiting all newly available datasets and reducing their confounding complexity in order to predict clinical outcomes and identify disease-relevant pathway alterations. We therefore set out to use multi-data-type genomic profiles (mRNA, DNA methylation, DNA copy-number alteration and microRNA) available from TCGA to identify prognostic signatures for predicting progression-free survival (PFS) and overall survival (OS). […] prediction algorithm and applied it to two datasets integrated from the four genomic data types. We (1) selected features through cross-validation; (2) generated a prognostic index for patient risk stratification; and (3) directly predicted continuous clinical outcome measures, that is, the time to recurrence and survival time. We used Kaplan-Meier p-values, hazard ratios (HR), and concordance probability estimates (CPE) to assess prediction performance, comparing separate and integrated datasets. Data integration resulted in the best PFS signature (withheld data: p-value = 0.008; HR = 2.83; CPE = 0.72). We provide a prediction tool that takes genomic profiles of primary surgical samples as input and generates patient-specific predictions of the time to recurrence and survival, along with outcome risk predictions. Using integrated genomic profiles resulted in an information gain for outcome prediction. Pathway analysis provided potential insights into functional changes affecting disease progression. The prognostic signatures, if prospectively validated, may be useful for interpreting therapeutic outcomes in clinical trials that aim to improve therapy for SeOvCa patients.
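The concordance probability estimate (CPE) cited above can be illustrated with the unsmoothed estimator of Gönen and Heller, which depends only on the Cox linear predictors β′x and not on the observed times. The abstract does not say whether this exact form or a smoothed variant was used, so treat it as an assumption; `concordance_probability_estimate` is a hypothetical name.

```python
import math
from itertools import combinations


def concordance_probability_estimate(linear_predictors):
    """Unsmoothed Gönen-Heller CPE from Cox linear predictors beta'x.

    For each pair, the patient with the larger linear predictor (higher
    hazard) fails first with probability 1 / (1 + exp(-|delta|)) under
    proportional hazards; pairs with tied predictors contribute 0.
    """
    n = len(linear_predictors)
    if n < 2:
        raise ValueError("need at least two patients")
    total = 0.0
    for a, b in combinations(linear_predictors, 2):
        delta = b - a
        if delta != 0:
            total += 1.0 / (1.0 + math.exp(-abs(delta)))
    # Average over the n(n - 1)/2 unordered pairs
    return 2.0 * total / (n * (n - 1))
```

For two patients whose linear predictors differ by ln 3 (a hazard ratio of 3), the estimate is 0.75; values approach 1 as the predicted risks become widely separated.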

    Metabolomics-Based Discovery of Diagnostic Biomarkers for Onchocerciasis

    Onchocerciasis, caused by the filarial parasite Onchocerca volvulus, afflicts millions of people, causing debilitating symptoms such as blindness and acute dermatitis. There are no accurate, sensitive means of diagnosing O. volvulus infection. Clinical diagnostics are urgently needed to achieve the goals of controlling and eliminating onchocerciasis and neglected tropical diseases in general. In this study, a metabolomics approach is introduced for discovering small-molecule biomarkers that can be used to diagnose O. volvulus infection. Blood samples from O. volvulus-infected and uninfected individuals from different geographic regions were compared using liquid chromatography separation and mass spectrometry identification. Thousands of chromatographic mass features were compared statistically, revealing 14 mass features that differed significantly between infected and uninfected individuals. Multivariate statistical analysis and machine learning algorithms demonstrated how these biomarkers could be used to differentiate between infected and uninfected individuals and indicated that the diagnostic may even be sensitive enough to assess worm viability. This study suggests the potential of these biomarkers for use in a field-based onchocerciasis diagnostic and shows how such an approach could be extended to develop diagnostics for other neglected tropical diseases.
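When thousands of mass features are tested at once, some correction for multiple comparisons is usually needed before calling a feature significant. The abstract does not say which test or correction was used; the sketch below shows the Benjamini-Hochberg false discovery rate procedure as one common choice, applied to per-feature p-values that are assumed to be already computed. `benjamini_hochberg` is a hypothetical helper name.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return indices of features declared significant at FDR level q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m and
    reject the hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])
```

For example, with p-values `[0.01, 0.04, 0.03, 0.5]` at q = 0.05, only the first feature passes, because 0.03 exceeds its rank-2 threshold of 0.025 and 0.04 exceeds its rank-3 threshold of 0.0375.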

    Gene Dosage, Expression, and Ontology Analysis Identifies Driver Genes in the Carcinogenesis and Chemoradioresistance of Cervical Cancer

    Integrative analysis of gene dosage, expression, and gene ontology (GO) data was performed to discover driver genes in the carcinogenesis and chemoradioresistance of cervical cancer. Gene dosage and expression profiles of 102 locally advanced cervical cancers were generated by microarray techniques. Fifty-two of these patients were also analyzed with the Illumina expression method to confirm the gene expression results. An independent cohort of 41 patients was used to validate the gene expressions associated with clinical outcome. Statistical analysis identified 29 recurrent gains and losses, and 3 losses (on 3p, 13q and 21q) associated with poor outcome after chemoradiotherapy. The intratumor heterogeneity, assessed from the gene dosage profiles, was low for these alterations, indicating that they had emerged before many other alterations and were probably early events in carcinogenesis. Integration of the alterations with gene expression and GO data identified genes that were regulated by the alterations and revealed five biological processes that were significantly overrepresented among the affected genes: apoptosis, metabolism, macromolecule localization, translation, and transcription. Four genes on 3p (RYBP, GBE1) and 13q (FAM48A, MED4) correlated with outcome at both the gene dosage and expression levels and were satisfactorily validated in the independent cohort. These integrated analyses yielded 57 candidate drivers of 24 genetic events, including novel loci responsible for chemoradioresistance. Further mapping of the connections among genetic events, drivers, and biological processes suggested that each individual event stimulates specific processes in carcinogenesis through the coordinated control of multiple genes. The present results may provide novel therapeutic opportunities for both early- and advanced-stage cervical cancers.
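Over-representation of a biological process among affected genes is often assessed with a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test). The abstract does not name the test that was used, so this is an assumed illustration; the function name `enrichment_p` and its arguments are hypothetical.

```python
from math import comb


def enrichment_p(n_universe, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value for GO term over-representation.

    n_universe: genes considered in total
    n_in_term:  genes annotated to the GO term
    n_selected: affected genes (e.g. genes regulated by an alteration)
    n_overlap:  affected genes that are annotated to the term
    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(comb(n_in_term, k) * comb(n_universe - n_in_term, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

Exact integer arithmetic via `math.comb` avoids overflow for genome-sized universes; with thousands of GO terms tested, the resulting p-values would still need a multiple-testing correction.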

    Prediction of Ischemic Events on the Basis of Transcriptomic and Genomic Profiling in Patients Undergoing Carotid Endarterectomy

    Classic risk factors, including age, smoking, serum cholesterol, diabetes and blood pressure, form the basis of current risk prediction models but fail to identify all individuals at risk. The objective of this study was to investigate whether genomic and transcriptional patterns improve the prediction of ischemic events in patients with established carotid artery disease. Genotype and gene expression profiles were obtained from carotid plaque tissue (n = 126) and peripheral blood mononuclear cells (n = 97) of patients undergoing carotid endarterectomy. Patients were followed for an average of 44 months, during which 25 ischemic events occurred (18 ischemic strokes and 7 myocardial infarctions). Blinded leave-one-out cross-validation on Cox regression coefficients was used to assign a gene expression–based risk score to each patient. Adding the carotid plaque gene expression–based risk score to classic risk factors improved the prediction of future ischemic events from an area under the curve (AUC) of 0.66 to an AUC of 0.79. Including gene expression risk scores from peripheral blood mononuclear cells, or from 25 established myocardial infarction risk single nucleotide polymorphisms, had only marginal effects on the prediction of ischemic events. In patients with atherosclerosis, including gene expression profiling of carotid endarterectomy tissue improves the prediction of ischemic events compared with prediction based on classic risk markers alone. The method may be developed further to identify subjects at very high risk of ischemic events.
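The leave-one-out risk scoring and AUC comparison described above can be sketched generically. This is an illustration under assumptions, not the study's code: `fit` and `score` are hypothetical callables standing in for fitting Cox regression coefficients and computing a patient's risk score, the toy `mean_difference_fit` model replaces Cox regression, and the AUC here treats the event indicator as a binary label, ignoring censoring and follow-up time.

```python
def loo_risk_scores(X, y, fit, score):
    """Leave-one-out: fit on all other patients, then score the held-out one."""
    scores = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(score(model, X[i]))
    return scores


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC; ties count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need both events and non-events")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_difference_fit(X, y):
    """Hypothetical toy model: per-feature event mean minus non-event mean."""
    cases = [x for x, lab in zip(X, y) if lab]
    controls = [x for x, lab in zip(X, y) if not lab]
    return [sum(c) / len(cases) - sum(k) / len(controls)
            for c, k in zip(zip(*cases), zip(*controls))]


def linear_score(weights, x):
    """Risk score as a weighted sum of a patient's features."""
    return sum(w * v for w, v in zip(weights, x))


X = [[0], [1], [2], [3], [10], [11], [12], [13]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(auc(loo_risk_scores(X, y, mean_difference_fit, linear_score), y))  # 1.0
```

Leave-one-out keeps each patient's score independent of that patient's own outcome, which is what makes the resulting AUC an honest estimate rather than a resubstitution fit.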